Empirical Risk Minimization

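Empirical risk minimization (ERM) uses a finite sample of data as a stand-in for the true data distribution: the expected loss under the true distribution (the true risk) is approximated by the average loss over the $N$ observed samples (the empirical risk).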

Formula

$$ \begin{aligned} \underline{w}^{*} &= \arg\min_{\underline{w}} \mathbb{E}_{p_{\text{true}}(\underline{x}, y)}[\ell( h_{\underline{w}}(\underline{x}), y)] &\text{[True Risk]} \\ &\approx \arg\min_{\underline{w}} \frac{1}{N} \sum_{i=1}^{N} \ell(p_i, y_i) &\text{[Empirical Risk]} \end{aligned} $$

where $p_i = h_{\underline{w}}(\underline{x}_i)$ is the prediction for the $i$-th sample.
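As a concrete illustration of the empirical-risk line, here is a minimal sketch that assumes a linear hypothesis $h_{\underline{w}}(\underline{x}) = \underline{x}^\top \underline{w}$ and squared loss. Neither choice comes from the note itself. The function names `empirical_risk` and `erm_linear` and the synthetic data are hypothetical, chosen only for this example.

```python
import numpy as np

def empirical_risk(w, X, y):
    """Average squared loss of the linear hypothesis h_w(x) = x @ w over the sample."""
    predictions = X @ w                      # p_i = h_w(x_i) for every sample
    return float(np.mean((predictions - y) ** 2))

def erm_linear(X, y):
    """Minimize the empirical risk over w (for squared loss this is least squares)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Synthetic sample (assumed setup): N = 200 points, 3 features, small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

w_hat = erm_linear(X, y)
print("fitted weights:", w_hat)
print("empirical risk at w_hat:", empirical_risk(w_hat, X, y))
```

Because `w_hat` minimizes the empirical risk, its risk on this sample is never larger than that of any other weight vector, including the `w_true` that generated the data.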

Nature

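$$ \begin{aligned} \mathbb{E}_{p_{\text{true}}(\underline{x}, y)}[\ell( h_{\underline{w}}(\underline{x}), y)] &> \frac{1}{N} \sum_{i=1}^{N} \ell(p_i, y_i) \end{aligned} $$

"Generalization Error" > "Training Error"

This typically holds for the weights fitted by ERM, because they are chosen to make the loss small on the training sample itself.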

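The gap between generalization error and training error can be checked numerically. The sketch below is an assumed setup, not part of the note: a linear model with squared loss, fitted by least squares on a small training sample, with the true risk estimated on a large fresh sample. The names `squared_risk` and `average_gap` are hypothetical. Both errors are averaged over many repetitions, since on any single small sample the inequality can occasionally flip.

```python
import numpy as np

def squared_risk(w, X, y):
    """Average squared loss of the linear hypothesis h_w(x) = x @ w."""
    return float(np.mean((X @ w - y) ** 2))

def average_gap(n_train=20, n_test=5000, dim=5, noise=1.0, trials=200, seed=0):
    """Return (mean training error, mean estimated generalization error)."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)
    train_risks, test_risks = [], []
    for _ in range(trials):
        # Small training sample on which the weights are fitted (ERM).
        X_tr = rng.normal(size=(n_train, dim))
        y_tr = X_tr @ w_true + noise * rng.normal(size=n_train)
        w_hat, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
        # Large fresh sample as a Monte Carlo estimate of the true risk.
        X_te = rng.normal(size=(n_test, dim))
        y_te = X_te @ w_true + noise * rng.normal(size=n_test)
        train_risks.append(squared_risk(w_hat, X_tr, y_tr))
        test_risks.append(squared_risk(w_hat, X_te, y_te))
    return float(np.mean(train_risks)), float(np.mean(test_risks))

mean_train, mean_test = average_gap()
print("mean training error:      ", mean_train)
print("mean generalization error:", mean_test)
```
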
by Jon